CLAIMS

What is claimed is: 

1. A method of populating a data structure with a plurality of character
strings, said method comprising:

i) encoding two or more biological molecules into character strings
to provide a collection of two or more different initial character strings wherein each of said
biological molecules comprises at least about 10 subunits;

ii) selecting at least two substrings from said character strings;

iii) concatenating said substrings to form one or more product strings
about the same length as one or more of the initial character strings;

iv) adding the product strings to a collection of strings; and

v) optionally repeating steps (i) or (ii) through (iv) using one or more
of said product strings as an initial string in the collection of initial character strings.
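
For illustration only, steps (i) through (v) of claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (encode, recombine, populate), the single random cut point, the fixed number of rounds, and the strict 10-subunit check are all hypothetical choices made for the example.

```python
import random


def encode(molecule: str) -> str:
    """Step (i): encode a molecule as a character string (assumed: one letter per subunit)."""
    return molecule.strip().upper()


def recombine(initial: list[str], rng: random.Random) -> str:
    """Steps (ii)-(iii): select a prefix of one string and a suffix of another, then join them."""
    a, b = rng.sample(initial, 2)
    cut = rng.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]  # product has the same length as b


def populate(molecules, n_products=4, rounds=2, seed=0):
    """Build a collection of product strings from two or more encoded molecules."""
    rng = random.Random(seed)
    initial = [encode(m) for m in molecules]
    if len(set(initial)) < 2 or any(len(s) < 10 for s in initial):
        raise ValueError("need two or more different strings of at least 10 subunits")
    collection = []
    for _ in range(rounds):
        products = [recombine(initial, rng) for _ in range(n_products)]
        collection.extend(products)   # step (iv): add products to the collection
        initial = initial + products  # step (v): products serve as initial strings
    return collection
```

Because each product takes its length from one of its parent strings, every product in this sketch is the same length as one of the initial strings.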

2. The method of claim 1, wherein said encoding comprises encoding
one or more nucleic acid sequences into said character strings.

3. The method of claim 2, wherein said one or more nucleic acid
sequences comprise a nucleic acid sequence encoding a known protein.

4. The method of claim 1, wherein said encoding comprises encoding
one or more amino acid sequences into said character strings.

5. The method of claim 4, wherein said one or more amino acid
sequences comprise a nucleic acid sequence encoding a known protein.



6. The method of claim 1, wherein said biological molecules have at
least 30% sequence identity.

7. The method of claim 1, wherein said selecting comprises selecting
substrings such that the ends of said substrings occur in string regions of about 3 to about 20
characters that have higher sequence identity with the corresponding region of another of
said initial character strings than the overall sequence identity between the same two strings.

8. The method of claim 1, wherein said selecting comprises selecting
substrings such that the ends of said substrings occur in predefined motifs of about 4 to about
8 characters.
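
As a rough illustration of motif-based selection, substring ends can be placed inside occurrences of predefined motifs. In the sketch below the helper names (motif_cut_points, split_at), the choice of cutting at the middle of each motif occurrence, and the example motif are assumptions made for the example only.

```python
import re


def motif_cut_points(s: str, motifs: list[str]) -> list[int]:
    """Return cut positions that fall inside occurrences of the given motifs."""
    points = set()
    for motif in motifs:
        if not 4 <= len(motif) <= 8:
            raise ValueError("motifs are assumed to be 4 to 8 characters long")
        # A zero-width lookahead also finds overlapping occurrences.
        for match in re.finditer(f"(?={re.escape(motif)})", s):
            points.add(match.start() + len(motif) // 2)  # cut mid-motif
    return sorted(points)


def split_at(s: str, points: list[int]) -> list[str]:
    """Split s into substrings whose ends lie at the given cut positions."""
    bounds = [0, *points, len(s)]
    return [s[i:j] for i, j in zip(bounds, bounds[1:])]
```

For example, with the hypothetical motif "GGATCC", split_at("ATGGATCCTTAAGGATCCGG", motif_cut_points("ATGGATCCTTAAGGATCCGG", ["GGATCC"])) yields three substrings, each of whose internal ends falls inside a motif occurrence.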

9. The method of claim 1, wherein said selecting and concatenating
comprises concatenating substrings from two different initial strings such that the
concatenation occurs in a region of about three to about twenty characters having higher
sequence identity between said two different initial strings than the overall sequence identity
between said two different initial strings.

10. The method of claim 1, wherein said selecting comprises aligning two
or more of said initial character strings to maximize pairwise identity between two or more
substrings of the character strings, and selecting a character that is a member of an aligned
pair for the end of one substring.
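
The idea of joining substrings in a region whose local identity exceeds the overall identity of two strings can be sketched as below. The sketch assumes the two strings are already aligned and of equal length (no gaps), and the names identity, crossover_points, and cross, as well as the default window of 5 characters, are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of positions holding identical characters (equal-length strings)."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def crossover_points(a: str, b: str, window: int = 5) -> list[int]:
    """Window starts where local identity is higher than the overall identity."""
    overall = identity(a, b)
    return [
        i
        for i in range(len(a) - window + 1)
        if identity(a[i:i + window], b[i:i + window]) > overall
    ]


def cross(a: str, b: str, point: int) -> str:
    """Concatenate the part of a before the point with the part of b after it."""
    return a[:point] + b[point:]
```

A product string would then be formed with cross(a, b, p) for any p returned by crossover_points(a, b), so that the junction lies in a locally well-conserved region.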

11. The method of claim 1, wherein said product strings are added to the
collection only if they have greater than 30% sequence identity with the initial strings.
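
An identity threshold of this kind could be applied as a simple filter before a product string is stored. The following is a sketch under assumptions: identity is computed position by position without alignment, "the initial strings" is read as every initial string, and the names identity, passes_filter, and add_if_similar are illustrative only.

```python
def identity(a: str, b: str) -> float:
    """Position-wise identity, normalized by the longer string (no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def passes_filter(product: str, initial: list[str], threshold: float = 0.30) -> bool:
    """True if the product exceeds the threshold against every initial string."""
    return all(identity(product, s) > threshold for s in initial)


def add_if_similar(collection: list[str], product: str,
                   initial: list[str], threshold: float = 0.30) -> bool:
    """Append the product to the collection only if it passes the filter."""
    if passes_filter(product, initial, threshold):
        collection.append(product)
        return True
    return False
```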

12. The method of claim 1, wherein said method further comprises
randomly altering one or more characters of said character strings.

13. The method of claim 12, wherein said method further comprises
randomly selecting and altering one or more occurrences of a particular preselected character
in said character strings.
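
Random alteration of characters, and alteration restricted to occurrences of one preselected character, might look like the sketch below. The function names (mutate, mutate_occurrences), the explicit alphabet parameter, and the fixed count n of altered positions are assumptions for illustration.

```python
import random


def mutate(s: str, n: int, alphabet: str, rng: random.Random) -> str:
    """Randomly alter n distinct positions, each to a different character."""
    chars = list(s)
    for i in rng.sample(range(len(chars)), n):
        chars[i] = rng.choice([c for c in alphabet if c != chars[i]])
    return "".join(chars)


def mutate_occurrences(s: str, target: str, n: int,
                       alphabet: str, rng: random.Random) -> str:
    """Randomly select up to n occurrences of target and alter only those."""
    positions = [i for i, c in enumerate(s) if c == target]
    chars = list(s)
    for i in rng.sample(positions, min(n, len(positions))):
        chars[i] = rng.choice([c for c in alphabet if c != target])
    return "".join(chars)
```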

14. The method of claim 1, wherein said coding, selecting, or
concatenating is performed on an internet site.

15. The method of claim 1, wherein said coding, selecting, or
concatenating is performed on a server.

16. The method of claim 1, wherein said coding, selecting, or
concatenating is performed on a client linked to a network.

17. A computer program product comprising computer code that:

i) encodes two or more biological molecules into character strings to
provide a collection of two or more different initial character strings wherein each of said
biological molecules comprises at least about ten subunits;

ii) selects at least two substrings from said character strings;

iii) concatenates said substrings to form one or more product strings
about the same length as one or more of the initial character strings;

iv) adds the product strings to a collection of strings; and

v) optionally repeats steps (i) or (ii) through (iv) using one or more of
said product strings as an initial string in the collection of initial character strings.

18. The program of claim 17, wherein said two or more biological
molecules are nucleic acid sequences.

19. The program of claim 17, wherein said two or more biological
molecules are nucleic acid sequences of known proteins.

20. The program of claim 17, wherein said two or more biological
molecules are amino acid sequences.

21. The program of claim 17, wherein said biological molecules have at
least 30% sequence identity.

22. The program of claim 17, wherein said code selects substrings such
that the ends of said substrings occur in string regions of about three to about twenty
characters that have higher sequence identity with the corresponding region of another of
said initial character strings than the overall sequence identity between the same two strings.

23. The program of claim 17, wherein said code selects substrings such
that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.

24. The program of claim 17, wherein said code selects and concatenates
substrings from two different initial strings such that the concatenation occurs in a region of
about three to about twenty characters having higher sequence identity between said two
different initial strings than the overall sequence identity between said two different initial
strings.

25. The program of claim 17, wherein said code selects substrings by aligning
two or more of said initial character strings to maximize pairwise identity between two or
more substrings of the character strings, and selecting a character that is a member of an
aligned pair for the end of one substring.

26. The program of claim 17, wherein said product strings are added to 
the collection only if they have greater than 30% identity with the initial strings. 



27. The program of claim 17, wherein said method further comprises 
randomly altering one or more characters of said character strings. 

28. The program of claim 27, wherein said method further comprises
randomly selecting and altering one or more occurrences of a particular preselected character
in said character strings.

29. The program of claim 17, wherein said code is stored on media selected
from the group consisting of magnetic media, optical media, and optomagnetic media.

30. The program of claim 17, wherein said code is in dynamic or static
memory of a computer.

31. A label generating system for creating a plurality of related labels, said
labeling system comprising:

an encoder for encoding two or more initial strings from biological
molecules;

an isolator for identifying and selecting substrings from said two or
more strings;

a concatenator for concatenating said substrings;

a data structure for storing the concatenated substrings as a collection
of strings;

a comparator for measuring the number and variability of the
collection of strings and determining that sufficient strings exist in the collection of strings;
and

a command writer for writing the collection of strings into a raw string
file.
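
The comparator and command writer elements of such a system could, as one possibility, be sketched as follows. The claims do not define how variability is measured or how the raw string file is laid out, so the measure used here (mean pairwise fraction of differing positions), the thresholds, the one-string-per-line file format, and all function names are assumptions.

```python
from itertools import combinations
from pathlib import Path


def difference(a: str, b: str) -> float:
    """Fraction of differing positions, counting any length excess as mismatches."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / longest


def variability(strings: list[str]) -> float:
    """Mean pairwise difference over the collection (0.0 for fewer than two strings)."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 0.0
    return sum(difference(a, b) for a, b in pairs) / len(pairs)


def sufficient(strings: list[str], min_count: int, min_variability: float) -> bool:
    """Comparator: enough distinct strings with enough variability?"""
    return len(set(strings)) >= min_count and variability(strings) >= min_variability


def write_raw(strings: list[str], path: str) -> None:
    """Command writer: store the collection one string per line."""
    Path(path).write_text("\n".join(strings) + "\n", encoding="utf-8")
```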

32. The system of claim 31, wherein said isolator comprises a comparator for
aligning and determining regions of identity between said two or more initial strings.

33. The system of claim 31, wherein said encoder comprises a means for
encoding a nucleic acid sequence into a character string.

34. The system of claim 31, wherein said encoder comprises a means for
encoding an amino acid sequence into a character string.



35. The system of claim 31, wherein said comparator comprises a means
for calculating sequence identity.



36. The system of claim 31, wherein said isolator selects substrings such
that the ends of said substrings occur in string regions of about three to about 100 characters
that have higher sequence identity with the corresponding region of another of said initial
character strings than the overall sequence identity between the same two strings.

37. The system of claim 31, wherein said isolator selects substrings such
that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.

38. The system of claim 31, wherein said isolator and concatenator
individually or in combination concatenate substrings from two different initial strings such
that the concatenation occurs in a region of about three to about 100 characters having higher
sequence identity between said two different initial strings than the overall sequence identity
between said two different initial strings.

39. The system of claim 31, wherein said isolator aligns two or more of
said initial character strings to maximize pairwise identity between two or more substrings of
the character strings, and selects a character that is a member of an aligned pair for the end
of one substring.




40. The system of claim 31, wherein said comparator adds strings to said
data structure only if they have greater than 30% identity with the initial strings.

41. The system of claim 31, further comprising an operator that randomly
alters one or more characters of the character strings.



42. The system of claim 41, wherein said operator randomly selects and
alters one or more occurrences of a particular preselected character in said character strings.



43. The system of claim 31, wherein said data structure is a data structure that
stores encoded nucleic acid sequences.



44. The system of claim 31, wherein said data structure is a data structure that
stores encoded amino acid sequences.